Kenneth Tay
Oct 22, 2019
readrmaps packagemaps package contains a lot of outlines of continents, countries, states, and countiesggplot2’s map_data() function puts these outlines in data frame format, which then allows us to plot them with ggplot()county_data <- map_data("county")
CA_data <- county_data %>% filter(region == "california")
head(CA_data)##        long      lat group order     region subregion
## 1 -121.4785 37.48290   157  6965 california   alameda
## 2 -121.5129 37.48290   157  6966 california   alameda
## 3 -121.8853 37.48290   157  6967 california   alameda
## 4 -121.8968 37.46571   157  6968 california   alameda
## 5 -121.9254 37.45998   157  6969 california   alameda
## 6 -121.9483 37.47717   157  6970 california   alameda
County outlines are drawn using geom_polygon.
coord_quickmap() preserves the aspect ratio of the map.
ggplot(data = CA_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, group = group)) + 
    coord_quickmap()## # A tibble: 3 x 2
##   County  Drought_percent
##   <chr>             <dbl>
## 1 alameda             100
## 2 alpine              100
## 3 amador              100
##        long     lat group order     region subregion
## 1 -121.4785 37.4829   157  6965 california   alameda
## 2 -121.5129 37.4829   157  6966 california   alameda
## 3 -121.8853 37.4829   157  6967 california   alameda
Our drought data and mapping information are in different datasets!
Sometimes our data are spread across different datasets, making it difficult to answer some questions.
Question: Who scored the highest in English in each class?
##     Name Class
## 1 Andrew     A
## 2   John     B
## 3   Mary     A
## 4   Jane     B
##     Name Subject Score
## 1   John English    76
## 2 Andrew English    66
## 3   John    Math    85
## 4   Mary English    71
dplyr)dplyr)##     Name Class Subject Score
## 1 Andrew     A English    66
## 2   John     B English    76
## 3   John     B    Math    85
## 4   Mary     A English    71
## 5   Jane     B    <NA>    NA
dplyr)Question: Who scored the highest in English in each class?
library(dplyr)
bio %>% left_join(scores, by = "Name") %>%
    filter(Subject == "English") %>%
    group_by(Class) %>%
    top_n(1, Score)## # A tibble: 2 x 4
## # Groups:   Class [2]
##   Name  Class Subject Score
##   <chr> <chr> <chr>   <dbl>
## 1 John  B     English    76
## 2 Mary  A     English    71
## # A tibble: 3 x 2
##   County  Drought_percent
##   <chr>             <dbl>
## 1 alameda             100
## 2 alpine              100
## 3 amador              100
##        long     lat group order     region subregion
## 1 -121.4785 37.4829   157  6965 california   alameda
## 2 -121.5129 37.4829   157  6966 california   alameda
## 3 -121.8853 37.4829   157  6967 california   alameda
Our drought data and mapping information are in different datasets!
Solution: Join the datasets together.
combined_data <- CA_data %>%
    left_join(drought_data, by = c("subregion" = "County"))
head(combined_data)##        long      lat group order     region subregion Drought_percent
## 1 -121.4785 37.48290   157  6965 california   alameda             100
## 2 -121.5129 37.48290   157  6966 california   alameda             100
## 3 -121.8853 37.48290   157  6967 california   alameda             100
## 4 -121.8968 37.46571   157  6968 california   alameda             100
## 5 -121.9254 37.45998   157  6969 california   alameda             100
## 6 -121.9483 37.47717   157  6970 california   alameda             100
Map the fill attribute of geom_polygon to the Drought_percent column.
ggplot(data = combined_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, 
            group = group, fill = Drought_percent)) + 
    coord_quickmap()Use scale_fill_distiller to define a more appropriate color scale.
ggplot(data = combined_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, 
            group = group, fill = Drought_percent)) + 
    scale_fill_distiller(palette = "YlOrRd", direction = 1) +
    coord_quickmap()
 Optional material
Inner join: Matches pairs of observations with equal keys, drops everything else. Hence, only keeps observations which appear in both datasets.
After matching pairs of observations with equal keys…